What is center for policing equity?

The Center of Policing Equity is an organisation that brings together a mixed group of professionals, including research scientists, equity experts, data analysts, and community trainers. Their common goal is to create unbiased and more rational systems for all individuals. If you’d like to learn more about their work, please visit their website at https://policingequity.org.

Dataset Summary

The dataset we have is a subset of a large collection of data gathered from police departments throughout the United States. It includes various records related to police behavior such as the use of force, stops of vehicles and pedestrians, crime data, and incident details such as date and time. The dataset we have is limited to Dallas, Texas in the year 2016.

Loading all the required libraries
library(ggplot2)
library(ggforce)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(scales)
library(RColorBrewer)
library(viridis)
## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
## 
##     viridis_pal
library(leaflet)
library(leaflet.extras)
library(htmlwidgets)
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ purrr   0.3.5
## ✔ tidyr   1.2.1     ✔ stringr 1.4.1
## ✔ readr   2.1.3     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ readr::col_factor() masks scales::col_factor()
## ✖ purrr::discard()    masks scales::discard()
## ✖ dplyr::filter()     masks plotly::filter(), stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(maps)
## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map
## 
## The following object is masked from 'package:viridis':
## 
##     unemp
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(treemapify)
library(calendR)
## ~~ Package calendR
## Visit https://r-coder.com/ for R tutorials ~~
library(ggpubr)

Data Pre-processing

The data pre-processing stage is mainly focused on understanding the types of columns in the data and the structure of the dataset. The null values in the data are also replaced with suitable string, “Record Missing”. Removing rows or columns with null values may not be appropriate as the null values could indicate a lack of data for a particular category or a completely different type of crime that does not have data for that column. Therefore, it is important to retain the null values, but label them as “not available (data)”.

# Load the data
data <- read.csv("37-00049_UOF-P_2016_prepped.csv")
# Removing first row as it has column names only
data = data[-1,]
attach(data)
head(data)
##   INCIDENT_DATE INCIDENT_TIME    UOF_NUMBER OFFICER_ID OFFICER_GENDER
## 2        9/3/16    4:14:00 AM         37702      10810           Male
## 3       3/22/16   11:00:00 PM         33413       7706           Male
## 4       5/22/16    1:29:00 PM         34567      11014           Male
## 5       1/10/16    8:55:00 PM         31460       6692           Male
## 6       11/8/16    2:30:00 AM  37879, 37898       9844           Male
## 7       9/11/16    7:20:00 PM         36724       9855           Male
##   OFFICER_RACE OFFICER_HIRE_DATE OFFICER_YEARS_ON_FORCE OFFICER_INJURY
## 2        Black            5/7/14                      2             No
## 3        White            1/8/99                     17            Yes
## 4        Black           5/20/15                      1             No
## 5        Black           7/29/91                     24             No
## 6        White           10/4/09                      7             No
## 7        White           6/10/09                      7             No
##            OFFICER_INJURY_TYPE OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## 2 No injuries noted or visible                      No      46424        Black
## 3                Sprain/Strain                     Yes      44324     Hispanic
## 4 No injuries noted or visible                      No      45126     Hispanic
## 5 No injuries noted or visible                      No      43150     Hispanic
## 6 No injuries noted or visible                      No      47307        Black
## 7 No injuries noted or visible                      No      46549        White
##   SUBJECT_GENDER SUBJECT_INJURY          SUBJECT_INJURY_TYPE
## 2         Female            Yes      Non-Visible Injury/Pain
## 3           Male             No No injuries noted or visible
## 4           Male             No No injuries noted or visible
## 5           Male            Yes               Laceration/Cut
## 6           Male             No No injuries noted or visible
## 7         Female             No No injuries noted or visible
##   SUBJECT_WAS_ARRESTED SUBJECT_DESCRIPTION          SUBJECT_OFFENSE
## 2                  Yes   Mentally unstable                    APOWW
## 3                  Yes   Mentally unstable                    APOWW
## 4                  Yes             Unknown                    APOWW
## 5                  Yes FD-Unknown if Armed           Evading Arrest
## 6                  Yes             Unknown Other Misdemeanor Arrest
## 7                  Yes             Unknown               Assault/FV
##   REPORTING_AREA BEAT SECTOR      DIVISION LOCATION_DISTRICT STREET_NUMBER
## 2           2062  134    130       CENTRAL               D14           211
## 3           1197  237    230     NORTHEAST                D9          7647
## 4           4153  432    430     SOUTHWEST                D6           716
## 5           4523  641    640 NORTH CENTRAL               D11          5600
## 6           2167  346    340     SOUTHEAST                D7          4600
## 7           1134  235    230     NORTHEAST                D9          1234
##    STREET_NAME STREET_DIRECTION STREET_TYPE
## 2        Ervay                N         St.
## 3     Ferguson             NULL         Rd.
## 4 bimebella dr             NULL         Ln.
## 5          LBJ             NULL       Frwy.
## 6    Malcolm X                S       Blvd.
## 7        Peavy             NULL         Rd.
##   LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY LOCATION_STATE
## 2                               211 N ERVAY ST        Dallas             TX
## 3                             7647 FERGUSON RD        Dallas             TX
## 4                             716 BIMEBELLA LN        Dallas             TX
## 5                               5600 L B J FWY        Dallas             TX
## 6                        4600 S MALCOLM X BLVD        Dallas             TX
## 7                                1234 PEAVY RD        Dallas             TX
##   LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON REASON_FOR_FORCE
## 2         32.782205         -96.797461          Arrest           Arrest
## 3         32.798978         -96.717493          Arrest           Arrest
## 4          32.73971          -96.92519          Arrest           Arrest
## 5                                               Arrest           Arrest
## 6                                               Arrest           Arrest
## 7         32.837527         -96.695566          Arrest           Arrest
##      TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## 2  Hand/Arm/Elbow Strike                                        
## 3            Joint Locks                                        
## 4      Take Down - Group                                        
## 5         K-9 Deployment                                        
## 6         Verbal Command     Take Down - Arm                    
## 7 Hand Controlled Escort                                        
##   TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## 2                                  NULL             Yes
## 3                                  NULL             Yes
## 4                                  NULL             Yes
## 5                                  NULL             Yes
## 6                                  NULL         No, Yes
## 7                                  NULL             Yes
summary(data)
##  INCIDENT_DATE      INCIDENT_TIME       UOF_NUMBER         OFFICER_ID       
##  Length:2383        Length:2383        Length:2383        Length:2383       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  OFFICER_GENDER     OFFICER_RACE       OFFICER_HIRE_DATE 
##  Length:2383        Length:2383        Length:2383       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  OFFICER_YEARS_ON_FORCE OFFICER_INJURY     OFFICER_INJURY_TYPE
##  Length:2383            Length:2383        Length:2383        
##  Class :character       Class :character   Class :character   
##  Mode  :character       Mode  :character   Mode  :character   
##  OFFICER_HOSPITALIZATION  SUBJECT_ID        SUBJECT_RACE      
##  Length:2383             Length:2383        Length:2383       
##  Class :character        Class :character   Class :character  
##  Mode  :character        Mode  :character   Mode  :character  
##  SUBJECT_GENDER     SUBJECT_INJURY     SUBJECT_INJURY_TYPE SUBJECT_WAS_ARRESTED
##  Length:2383        Length:2383        Length:2383         Length:2383         
##  Class :character   Class :character   Class :character    Class :character    
##  Mode  :character   Mode  :character   Mode  :character    Mode  :character    
##  SUBJECT_DESCRIPTION SUBJECT_OFFENSE    REPORTING_AREA         BEAT          
##  Length:2383         Length:2383        Length:2383        Length:2383       
##  Class :character    Class :character   Class :character   Class :character  
##  Mode  :character    Mode  :character   Mode  :character   Mode  :character  
##     SECTOR            DIVISION         LOCATION_DISTRICT  STREET_NUMBER     
##  Length:2383        Length:2383        Length:2383        Length:2383       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  STREET_NAME        STREET_DIRECTION   STREET_TYPE       
##  Length:2383        Length:2383        Length:2383       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY     
##  Length:2383                                  Length:2383       
##  Class :character                             Class :character  
##  Mode  :character                             Mode  :character  
##  LOCATION_STATE     LOCATION_LATITUDE  LOCATION_LONGITUDE INCIDENT_REASON   
##  Length:2383        Length:2383        Length:2383        Length:2383       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  REASON_FOR_FORCE   TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
##  Length:2383        Length:2383         Length:2383         Length:2383        
##  Class :character   Class :character    Class :character    Class :character   
##  Mode  :character   Mode  :character    Mode  :character    Mode  :character   
##  TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
##  Length:2383         Length:2383         Length:2383        
##  Class :character    Class :character    Class :character   
##  Mode  :character    Mode  :character    Mode  :character   
##  TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
##  Length:2383         Length:2383         Length:2383        
##  Class :character    Class :character    Class :character   
##  Mode  :character    Mode  :character    Mode  :character   
##  TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES   FORCE_EFFECTIVE   
##  Length:2383          Length:2383        Length:2383       
##  Class :character     Class :character   Class :character  
##  Mode  :character     Mode  :character   Mode  :character
# Dataset structure
str(data)
## 'data.frame':    2383 obs. of  47 variables:
##  $ INCIDENT_DATE                               : chr  "9/3/16" "3/22/16" "5/22/16" "1/10/16" ...
##  $ INCIDENT_TIME                               : chr  "4:14:00 AM" "11:00:00 PM" "1:29:00 PM" "8:55:00 PM" ...
##  $ UOF_NUMBER                                  : chr  "37702" "33413" "34567" "31460" ...
##  $ OFFICER_ID                                  : chr  "10810" "7706" "11014" "6692" ...
##  $ OFFICER_GENDER                              : chr  "Male" "Male" "Male" "Male" ...
##  $ OFFICER_RACE                                : chr  "Black" "White" "Black" "Black" ...
##  $ OFFICER_HIRE_DATE                           : chr  "5/7/14" "1/8/99" "5/20/15" "7/29/91" ...
##  $ OFFICER_YEARS_ON_FORCE                      : chr  "2" "17" "1" "24" ...
##  $ OFFICER_INJURY                              : chr  "No" "Yes" "No" "No" ...
##  $ OFFICER_INJURY_TYPE                         : chr  "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" "No injuries noted or visible" ...
##  $ OFFICER_HOSPITALIZATION                     : chr  "No" "Yes" "No" "No" ...
##  $ SUBJECT_ID                                  : chr  "46424" "44324" "45126" "43150" ...
##  $ SUBJECT_RACE                                : chr  "Black" "Hispanic" "Hispanic" "Hispanic" ...
##  $ SUBJECT_GENDER                              : chr  "Female" "Male" "Male" "Male" ...
##  $ SUBJECT_INJURY                              : chr  "Yes" "No" "No" "Yes" ...
##  $ SUBJECT_INJURY_TYPE                         : chr  "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" "Laceration/Cut" ...
##  $ SUBJECT_WAS_ARRESTED                        : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ SUBJECT_DESCRIPTION                         : chr  "Mentally unstable" "Mentally unstable" "Unknown" "FD-Unknown if Armed" ...
##  $ SUBJECT_OFFENSE                             : chr  "APOWW" "APOWW" "APOWW" "Evading Arrest" ...
##  $ REPORTING_AREA                              : chr  "2062" "1197" "4153" "4523" ...
##  $ BEAT                                        : chr  "134" "237" "432" "641" ...
##  $ SECTOR                                      : chr  "130" "230" "430" "640" ...
##  $ DIVISION                                    : chr  "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL" ...
##  $ LOCATION_DISTRICT                           : chr  "D14" "D9" "D6" "D11" ...
##  $ STREET_NUMBER                               : chr  "211" "7647" "716" "5600" ...
##  $ STREET_NAME                                 : chr  "Ervay" "Ferguson" "bimebella dr" "LBJ" ...
##  $ STREET_DIRECTION                            : chr  "N" "NULL" "NULL" "NULL" ...
##  $ STREET_TYPE                                 : chr  "St." "Rd." "Ln." "Frwy." ...
##  $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr  "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" "5600 L B J FWY" ...
##  $ LOCATION_CITY                               : chr  "Dallas" "Dallas" "Dallas" "Dallas" ...
##  $ LOCATION_STATE                              : chr  "TX" "TX" "TX" "TX" ...
##  $ LOCATION_LATITUDE                           : chr  "32.782205" "32.798978" "32.73971" "" ...
##  $ LOCATION_LONGITUDE                          : chr  "-96.797461" "-96.717493" "-96.92519" "" ...
##  $ INCIDENT_REASON                             : chr  "Arrest" "Arrest" "Arrest" "Arrest" ...
##  $ REASON_FOR_FORCE                            : chr  "Arrest" "Arrest" "Arrest" "Arrest" ...
##  $ TYPE_OF_FORCE_USED1                         : chr  "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" "K-9 Deployment" ...
##  $ TYPE_OF_FORCE_USED2                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED3                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED4                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED5                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED6                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED7                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED8                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED9                         : chr  "" "" "" "" ...
##  $ TYPE_OF_FORCE_USED10                        : chr  "" "" "" "" ...
##  $ NUMBER_EC_CYCLES                            : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ FORCE_EFFECTIVE                             : chr  " Yes" " Yes" " Yes" " Yes" ...
#checking data for missing values
data <- data %>% dplyr::mutate_all(~ifelse(.=="NULL", "Records Missing", .))
data <- data %>% dplyr::mutate_all(~ifelse(.=="Unknown", "Records Missing", .))

Data Features

This data pertains to policing equity in Dallas, Texas in 2016. It consists of 2383 rows and spans across 47 columns. My goal is to analyze the dataset and investigate whether there is a racial connection to the crimes that occurred in Dallas. Specifically, I will examine whether the criminals belong to the same race or different races. To help you understand the dataset better, I have created a table that highlights the main entities and their corresponding features.

Object Features in Dataset
Incident Date, Time, Reason
Officer Id, Gender, Race, Injury, Injury type, Years on force etc.
Subject Id, Gender, Race, Injury, Injury type, Arrested, Offense etc.
Street Number, Name, Direction
Location Reporting area, Division, District, Latitude, Longitude

Criminal Activity Reported in Dallas 2016

# Visualisation 1: Crime reported based on different race of the individuals

# Define a color gradient based on the height of the bars
changing_colors <- colorRamp(c("#FF9933", "#F44339"))

data %>%
  filter(!is.na(SUBJECT_RACE)) %>%
  group_by(SUBJECT_RACE) %>%
  summarise(Count = n()) %>%
  mutate(TotalCount = nrow(data)) %>%
  mutate(Percentage = (Count/TotalCount) * 100) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(SUBJECT_RACE = reorder(SUBJECT_RACE, Count)) %>%
  plot_ly(x = ~SUBJECT_RACE, y = ~Percentage, type = 'bar', 
          name = "Criminals Reported By Race", 
          marker = list(color = ~Percentage, colorscale = 'changing_colors')) %>%
  layout(xaxis = list(title = "Criminal Race", tickangle = 0, showticklabels = TRUE),
         yaxis = list(title = "Percentage", tickformat = ".2f"),
         margin = list(l = 60, r = 10, b = 60, t = 30),
         title = list(text = "Percentage of Reported Criminals Based on Race", x = 0.5, y = 0.95, font = list(size = 20)))

The plot clearly indicates that individuals of the Black race are reported more frequently than any other race for criminal activity, accounting for more than 50% of the data. The other two race groups reported for crime are Hispanic and White, in that order. The remaining subject races tend to have very little reported crime data.

#Visualisation 2: Count of reported subjects by gender

# count number of subjects by gender
subject_gender_counts <- data %>%
  filter(!is.na(SUBJECT_GENDER)) %>%
  group_by(SUBJECT_GENDER) %>%
  summarise(count = n())

pie_chart <- plot_ly(
  labels = subject_gender_counts$SUBJECT_GENDER,
  values = subject_gender_counts$count,
  type = "pie",
  textposition = "inside",
  textinfo = "label+percent",
  insidetextorientation = "horizontal",
  marker = list(
    colors = c("#1f77b4", "#ff7f0e", "#2ca02c"),
    line = list(color = "#FFFFFF", width = 1)
  )
) %>%
  layout(
    title = list(
      text = "Count of Reported Subjects by Gender",
      font = list(size = 20)
    ),
    margin = list(l = 60, r = 60, b = 60, t = 60),
    legend = list(
      x = 0.5, 
      y = -0.2,
      bgcolor = "white",
      bordercolor = "black",
      borderwidth = 1,
      orientation = "h"
    )
  )

pie_chart
#Visualisation 3: Genders reported for crimes in each subject race 

# Create a summary of subject race and arrest status counts
subject_race_arrested <- data %>%
  filter(!is.na(SUBJECT_RACE), !is.na(SUBJECT_GENDER)) %>%
  group_by(SUBJECT_RACE, SUBJECT_GENDER) %>%
  summarise(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
bar_chart <- plot_ly(subject_race_arrested, x = ~SUBJECT_RACE, y = ~count, color = ~SUBJECT_GENDER, type = "bar") %>%
  layout(
    title = list(
      text = "Count of Arrests by Subject Gender and Race",
      font = list(size = 20)
    ),
    xaxis = list(title = "Subject Race", tickangle = 0, gap = 0.3),
    yaxis = list(title = "Count"),
    showlegend = TRUE,
    bargap = 0.2
  )

bar_chart

The pie chart clearly illustrates the distribution of reported subjects for criminal activity by gender, with women accounting for only 18.5% and men dominating with a significant majority of 81.1%. Furthermore, there is a barplot illustrating the distribution of male and female criminals across different races. Once again, the black male and female populations are found to be the most dominant among the different races. Clearly showing that both male and female are subject to more criminal activities in their respective gender category as well.

Analysing common races reported/arrested

#Visualisation 4: Count of Subject Arrested by Race

# Create a summary of subject race and arrest status counts
subject_race_arrested <- data %>%
  filter(!is.na(SUBJECT_RACE), !is.na(SUBJECT_WAS_ARRESTED)) %>%
  group_by(SUBJECT_RACE, SUBJECT_WAS_ARRESTED) %>%
  summarise(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
# Create a bar chart
bar_chart <- plot_ly(subject_race_arrested, x = ~SUBJECT_RACE, y = ~count, color = ~SUBJECT_WAS_ARRESTED, type = "bar") %>%
  layout(
    title = list(
      text = "Count of Arrests by Subject Race",
      font = list(size = 20)
    ),
    xaxis = list(title = "Subject Race"),
    yaxis = list(title = "Count"),
    showlegend = TRUE
  )

bar_chart
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
#Visualisation 5: Focus on major reported criminal races 
incidents_race <- data[SUBJECT_RACE %in% c('Black', 'Hispanic', 'White'),]

indicents <- ggplot(incidents_race, aes(x = INCIDENT_REASON, fill = SUBJECT_RACE)) + 
  geom_bar() + 
  labs(title = "Actions Taken Against Subjects by Race", x = "Action Types", y= "Total Count with Racial Facet") + 
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_fill_manual(values = c("#1b9e77", "#d95f02", "#7570b3")) # custom color palette

ggplotly(indicents)

As previously mentioned, individuals from a Black racial background are more likely to be reported for a crime. This observation is reinforced by the barplot that shows the count of arrests, which indicates that Black subjects are more likely to be arrested for a reported incident when compared to individuals from other reported races such as Hispanics and Whites.

Total count with racial facet” refers to a way of categorising or presenting data where the total number of occurrences (count) is broken down by racial or ethnic groups (facets). The horizontal barplot suggests the actions taken against the subjects of different race. This plot shows that the arrests and service calls are the two primary actions by the police of response in Dallas. And in both the responses for action taken, it primarily taken against the Black followed by Hispanic and White.

#Visualisation 6: Arrests count by different reasons
# Group the data by INCIDENT_REASON and count the number of occurrences

# Count the number of occurrences of each unique value in the INCIDENT_REASON column
incident_count <- data %>% 
  count(INCIDENT_REASON)

# Define a color palette using RColorBrewer
colors <- brewer.pal(n = nrow(incident_count), name = "Set2")
## Warning in brewer.pal(n = nrow(incident_count), name = "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
pie_chart <- plot_ly(incident_count, labels = ~INCIDENT_REASON, values = ~n, type = "pie",
                     marker = list(colors = "colors")) %>% 
  layout(title = "Arrests Count by Incident Reasons")

pie_chart

The plot presents a comprehensive visualisation of the primary methods employed for reporting a crime. The data indicates that the majority of incidents are reported through arrests, accounting for 48.6% of the total. Service calls are the second most common means of reporting, accounting for 28.2% of incidents. This information is critical for law enforcement to make informed decisions about resource allocation and crime prevention strategies.

# Visualisation 7: Incidents by Subject Description for Black, Hispanic and White Subjects
# Filter data to only include Hispanic, Black and White SUBJECT_RACE
filtered_data <- data %>% 
  filter(SUBJECT_RACE %in% c("Hispanic", "Black", "White"))
# Group data by SUBJECT_RACE and SUBJECT_DESCRIPTION
grouped_data <- filtered_data %>% 
  group_by(SUBJECT_RACE, SUBJECT_DESCRIPTION) %>% 
  summarize(count = n())
## `summarise()` has grouped output by 'SUBJECT_RACE'. You can override using the
## `.groups` argument.
# Create a bar plot using ggplot2 with facet_wrap
facet_plot <- ggplot(grouped_data, aes(x = SUBJECT_DESCRIPTION, y = count, fill = SUBJECT_DESCRIPTION)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(x = "Subject Description", y = "Count", fill = "Subject Description", 
       title = "Incidents by Subject Description for Hispanic, Black, and White Subjects") +
  theme_bw() +
  facet_wrap(~SUBJECT_RACE, ncol = 1) +
  theme(axis.text.x = element_blank())
plotly_facet_plot <- ggplotly(facet_plot)
plotly_facet_plot

The illustration provides insights into the primary descriptions of subjects during their arrests. It is evident that individuals from the Black community are more likely to be reported under the description of influence of unknown drugs and mental instability. Conversely, individuals from the Hispanic and White communities are more likely to be suspected of being under the influence of alcohol.

Understanding officers in Dallas

#Visualisation 8: Count of Officers by Gender
officer_gender_counts <- data %>%
  filter(!is.na(OFFICER_GENDER)) %>%
  group_by(OFFICER_GENDER) %>%
  summarise(count = n())

# create pie chart
pie_chart <- plot_ly(
  labels = officer_gender_counts$OFFICER_GENDER,
  values = officer_gender_counts$count,
  type = "pie",
  textposition = "inside",
  textinfo = "label+percent",
  insidetextorientation = "horizontal",
  marker = list(
    colors = c("#1f77b4", "#ff7f0e", "#2ca02c"),
    line = list(color = "#FFFFFF", width = 1)
  )
) %>%
  layout(
    title = list(
      text = "Count of Officers by Gender",
      font = list(size = 20)
    ),
    margin = list(l = 60, r = 60, b = 60, t = 60),
    legend = list(
      x = 0.5, 
      y = -0.2,
      bgcolor = "white",
      bordercolor = "black",
      borderwidth = 1,
      orientation = "h"
    )
  )

pie_chart
# Visualisation 9: Officers by race

# Count the number of occurrences of each unique value in the OFFICER RACE column
officer_race_counts <- data %>% count(OFFICER_RACE)

# Define a color palette using RColorBrewer
colors <- brewer.pal(n = nrow(officer_race_counts), name = "Set2")

pie_chart <- plot_ly(officer_race_counts, labels = ~OFFICER_RACE, values = ~n, type = "pie",
                     marker = list(colors = colors)) %>%
  layout(title = list(text = "Distribution of Officer Races"))

pie_chart
#Visualisation 10: # Group data by OFFICER_RACE and calculate average tenure

# Group the data by officer race and calculate the average years on force
duration_on_force <- data %>% 
  group_by(OFFICER_RACE) %>% 
  summarise(Average_Years_On_Force = mean(as.numeric(OFFICER_YEARS_ON_FORCE)))

pie_chart <- plot_ly(duration_on_force, 
                     labels = ~OFFICER_RACE, 
                     values = ~Average_Years_On_Force, 
                     type = 'pie', 
                     hole = 0.4,
                     textposition = "inside",
                     texttemplate = "%{value:.2f}") %>%
  layout(title = "Average Years on Force by Officer Race",
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
pie_chart <- pie_chart %>% 
  add_annotations(x = 0.5, y = -0.15, 
                  text = "",
                  showarrow = FALSE, 
                  font = list(size = 14)) %>% 
  layout(legend = list(orientation = "h", 
                       y = -0.25, x = 0.5, 
                       font = list(size = 12)),
         font = list(size = 12))
pie_chart
#Visualisation 11: Crime reported by police officers for different types of subject race 

data_count <- data %>%
  filter(!is.na(OFFICER_RACE) & !is.na(SUBJECT_RACE)) %>%
  group_by(OFFICER_RACE, SUBJECT_RACE) %>%
  summarise(Count = n())
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
plot <- ggplot(data_count, aes(x = OFFICER_RACE, y = Count, fill = SUBJECT_RACE)) +
  geom_bar(stat = "identity") +
  labs(title = "Reported Crime Grouped By Subject Race", x = "Officer Race", y = "Count") +
  theme_bw()

# convert ggplot to plotly object and add hover text
plotly_obj <- ggplotly(plot, tooltip = c("OFFICER_RACE", "SUBJECT_RACE", "Count"))

plotly_obj <- plotly_obj %>% 
  layout(
    legend = list(title = "Subject Race"),
    xaxis = list(title = "Officer Race"),
    yaxis = list(title = "Count"),
    margin = list(l = 60, r = 10, b = 60, t = 30),
    title = list(text = "Reported Crime Grouped By Subject Race", x = 0.5, y = 0.95, font = list(size = 20))
  )

plotly_obj

It is worth noting that the majority of police officers are male, with a representation of 89.9%. Among the different racial groups, White officers represent the highest proportion at 61.7%, followed by Hispanic officers at 20.2%, and Black officers at 14.3%. It is important to acknowledge that while White officers report the highest number of crimes across all communities, it is not necessarily indicative of racial profiling. However, further analysis and investigation are required to determine the underlying factors that contribute to these patterns and to ensure that law enforcement is conducted in a fair and unbiased manner. In terms of the handling of cases, white officers are more likely to handle cases involving black subjects. This can be because of the more majority of white officers in the police force in comparison to other officer races.

Incidents in Dallas

#Visualisation 12: Subject Race that caused police officers hospitalisation 

hosp_plot <- ggplotly(
  ggplot(data, aes(x = OFFICER_HOSPITALIZATION, y = SUBJECT_RACE)) +
    geom_point(alpha = 0.5, color = "steelblue") +
    facet_grid(~ OFFICER_RACE) +
    labs(
      title = "Pair Plot for Officers Hospitalised in Action",
      x = "Officer Hospitalisation",
      y = "Subject Race"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      axis.title = element_text(size = 12, face = "bold"),
      strip.text = element_text(size = 12, face = "bold")
    )
)

hosp_plot
# Visualisation 13: Count of incidents in year 2016


# Creating an empty data frame with all dates in 2016
dates <- data.frame(Date = seq.Date(from = as.Date("2016-01-01"), to = as.Date("2016-12-31"), by = "days"))

# Counting all the incidents for each date in the dataset
incident_counts <- data %>%
  mutate(Date = mdy(INCIDENT_DATE)) %>%
  group_by(Date) %>%
  summarize(Freq = n())

incidents_by_day <- left_join(dates, incident_counts, by = "Date")
incidents_by_day$Freq[is.na(incidents_by_day$Freq)] <- 0

plot_ly(
  data = incidents_by_day,
  x = ~Date,
  y = ~weekdays(Date),
  z = ~Freq,
  type = "scatter",
  mode = "markers",
  marker = list(
    symbol = "circle",
    sizemode = "diameter",
    sizeref = 0.1,
    sizemin = 3,
    color = ~Freq,
    colorscale = list(c(0, max(incidents_by_day$Freq)), c("#FCFFDD", "#05A972")),
    showscale = TRUE,
    colorbar = list(title = "Number of Incidents")
  )
) %>%
  layout(
    title = "Police Equiting 2016",
    xaxis = list(title = NULL),
    yaxis = list(title = NULL)
  )
## Warning: 'scatter' objects don't have these attributes: 'z'
## Valid attributes include:
## 'cliponaxis', 'connectgaps', 'customdata', 'customdatasrc', 'dx', 'dy', 'error_x', 'error_y', 'fill', 'fillcolor', 'fillpattern', 'groupnorm', 'hoverinfo', 'hoverinfosrc', 'hoverlabel', 'hoveron', 'hovertemplate', 'hovertemplatesrc', 'hovertext', 'hovertextsrc', 'ids', 'idssrc', 'legendgroup', 'legendgrouptitle', 'legendrank', 'line', 'marker', 'meta', 'metasrc', 'mode', 'name', 'opacity', 'orientation', 'selected', 'selectedpoints', 'showlegend', 'stackgaps', 'stackgroup', 'stream', 'text', 'textfont', 'textposition', 'textpositionsrc', 'textsrc', 'texttemplate', 'texttemplatesrc', 'transforms', 'type', 'uid', 'uirevision', 'unselected', 'visible', 'x', 'x0', 'xaxis', 'xcalendar', 'xhoverformat', 'xperiod', 'xperiod0', 'xperiodalignment', 'xsrc', 'y', 'y0', 'yaxis', 'ycalendar', 'yhoverformat', 'yperiod', 'yperiod0', 'yperiodalignment', 'ysrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
# Visualisation 14: Year, Month and Hourly Incident Rates

data$INCIDENT_DATE <- as.Date(data$INCIDENT_DATE, format = "%m/%d/%Y")
data$INCIDENT_DATE <- gsub("00","20",data$INCIDENT_DATE)
data$INCIDENT_DATE <- as.Date(data$INCIDENT_DATE, format = "%Y-%m-%d")
data$INCIDENT_TIME <- format(strptime(data$INCIDENT_TIME, "%I:%M:%S %p"), "%H:%M:%S")
data$INCIDENT_MONTH <- months(as.Date(data$INCIDENT_DATE))
data$INC_MONTH <-format(data$INCIDENT_DATE,"%m")
data$INCIDENT_HOUR <- as.numeric(substr(data$INCIDENT_TIME, 0, 2))
data$INCIDENT_DAY <- wday(data$INCIDENT_DATE, label=TRUE)
data$INC_HOUR <- substr(data$INCIDENT_TIME, 0, 2)
data$INC_DATE <- substr(data$INCIDENT_DATE, 9, 10)

data_year <-  data %>%
  group_by(INCIDENT_DATE,INCIDENT_MONTH,INCIDENT_DAY) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DATE', 'INCIDENT_MONTH'. You can
## override using the `.groups` argument.
data_month <-  data %>%
  group_by(INC_MONTH) %>%
  summarize(count = n())

data_day <-  data %>%
  group_by(INCIDENT_DAY,INCIDENT_HOUR) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DAY'. You can override using the
## `.groups` argument.
data$INC_HOUR <- substr(data$INCIDENT_TIME, 0, 2)

data %>% group_by(INC_HOUR) %>%
  summarize(avg =n()) -> data_hour_n
plot1 <- ggplot(data = data_year, aes(INCIDENT_DATE, count)) +   geom_line(size=0.5, col="gray") +
  geom_smooth(method = "loess", color = "purple", span = 1/5) + theme_bw() + labs(x="Months ", y= "INCIDENT COUNTS", title="Year vs Incident Counts")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plot2 <- ggplot(data_month, aes(x=INC_MONTH, y =count, group=1)) + geom_line()  + geom_line( size = 1,colour ="black") + labs(x="Months of 2016", y= "INCIDENT COUNTS", title="Months vs  Incident Counts")  + theme_bw()

plot3 <- ggplot(data_hour_n, aes(x = INC_HOUR, y = avg, group = "count")) + geom_line( size = 1, colour = "magenta") + labs(x="HOURS IN A DAY", y= "INCIDENT COUNTS", title="Hours vs Incident Counts")+ theme_bw() +
  theme(axis.text.x=element_text(angle=-90, vjust=0.5)) +
  
  labs(x = "Hour of Day", y = "count") + theme_bw()

plot4 <- ggplot(data_year, aes(count)) +
  geom_density(alpha = 0.5, colour = "black", fill ="blue")+ labs(x="Incident counts", y= "Density", title="Distribution of Incident Counts") + theme_bw()


grid.arrange(plot1, plot2, plot3, plot4 , nrow=2)
## `geom_smooth()` using formula = 'y ~ x'

# Incidents by Month
ggplot(data = data, aes(x = INCIDENT_HOUR, fill = INCIDENT_MONTH, group = INCIDENT_MONTH)) +
  geom_density(alpha = 0.5, adjust = 2) +
  scale_fill_viridis_d(alpha = 0.2, begin = 0.2, end = 0.9, option = "turbo", direction = -1) +
  labs(title = "Density Plot of Incidents by Month and Hour",
       x = "Hour of the Incident",
       y = "Density",
       fill = "Month of Incident")
## Warning: Removed 10 rows containing non-finite values (`stat_density()`).

#Visualisation 15: Incident tracking through the year

# Group the data by date and count the number of incidents
crimes_rate_yrly <- data %>%
  group_by(INCIDENT_DATE) %>%
  summarise(Total_Crimes = n())

crimes_rate_yrly$INCIDENT_DATE <- as.Date(crimes_rate_yrly$INCIDENT_DATE, format =  "%m/%d/%Y")

names(crimes_rate_yrly)[names(crimes_rate_yrly) == "INCIDENT_DATE"] <- "Date_Happened"

line_chart <- ggplot(crimes_rate_yrly, aes(x = Date_Happened, y = Total_Crimes)) +
  geom_line(color = "red") +
  geom_point(color = "red") +
  labs(x = "Date", y = "Incident Count", title = "Incidents Tracking Through the Year") +
  theme_minimal()

plot <- ggplotly(line_chart, tooltip = c("Total_Crimes"))

plot
#Visualisation 16: Incidents Counts Across Days and Months

# Group the data by day and month and count the number of incidents
df_monthday <-  data %>%
  group_by(INCIDENT_DAY, INC_MONTH) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'INCIDENT_DAY'. You can override using the
## `.groups` argument.
df_monthday <- df_monthday[complete.cases(df_monthday), ]

ggplot(df_monthday, aes(x= INCIDENT_DAY, y=INC_MONTH,fill = count)) + geom_tile( ) + 
  geom_text(aes(INCIDENT_DAY, INC_MONTH, label = count), color = "black", size = 4) + scale_y_discrete("Months",labels=c("January","February", "March", "April","May", "June","July","August", "September","October","November","December")) + labs(x="Days of Month", y= "Months", title=" Incident Counts across Days and Months")+
  scale_fill_gradientn(colours = c("#3794bf", "#FFFFFF", "#df8640"))

Overall, there was a decrease in the number of incidents reported throughout the year, with February and March having the highest numbers and December the lowest. During the day, there were two peaks of incidents reported: one between 5 pm and 8 pm, and another smaller peak at 2 am. The lowest incidents occurred between 4 am and 10 am. The distribution of incident counts across the year shows a right skewness, with most days having fewer than 24 incidents reported. However, there were two peaks of incidents reported in several months, with normal distributions observed in March, June, July, January, August, and April, and left skewness observed in September. Incident counts also vary across different days of the week, with Sundays displaying a normal distribution of incidents. The average range of incidents reported per day and month falls between 3 to 20. These patterns and trends provide valuable insights for law enforcement agencies to allocate resources effectively and improve their response to incidents.

#Visualisation 17: Each type of force used for each subject race 
data <- data %>% rename( F1 = TYPE_OF_FORCE_USED1, F2 = TYPE_OF_FORCE_USED2, F3 = TYPE_OF_FORCE_USED3, F4 = TYPE_OF_FORCE_USED4, F5 = TYPE_OF_FORCE_USED5, F6 = TYPE_OF_FORCE_USED6, F7 = TYPE_OF_FORCE_USED7, F8 = TYPE_OF_FORCE_USED8, F9 = TYPE_OF_FORCE_USED9, F10 = TYPE_OF_FORCE_USED10)
df_grouped <- data %>% 
  filter(SUBJECT_RACE %in% c("White", "Black", "Hispanic")) %>%
  group_by(SUBJECT_RACE, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10 ) %>% 
  summarise(count = n()) %>% 
  pivot_longer(cols = F1:F10, names_to = "Force_Type", values_to = "Count") %>%
  filter(Count > 0)
## `summarise()` has grouped output by 'SUBJECT_RACE', 'F1', 'F2', 'F3', 'F4',
## 'F5', 'F6', 'F7', 'F8', 'F9'. You can override using the `.groups` argument.
plot <- ggplot(df_grouped, aes(x = Force_Type, y = Count, fill = SUBJECT_RACE)) + 
  geom_col(position = "dodge") + 
  facet_wrap(~ SUBJECT_RACE, scales = "free_x") +
  labs(title = "Each Type of Force Used for Each Subject Race", x = "Type of Force Used", y = "Count", fill = "Race") + 
  theme(legend.position = "bottom") 

ggplotly(plot, tooltip = c("Force_Type", "Count", "SUBJECT_RACE"))

This plot shows the count of each type of force used for different subject races. The x-axis represents the type of force used, while the y-axis represents the count of that type of force. The bars are colored according to the subject race, and the plot is split into different facets for each race. It clearly shows that all the major subject race are prone to different measures of the use of force for arrests.

# Visualisation 18: Group the data by officer race and use of force
df_grouped <- data %>% 
  group_by(OFFICER_RACE, REASON_FOR_FORCE) %>% 
  summarise(count = n())
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
# created stacked bar chart
ggplot(df_grouped, aes(x = OFFICER_RACE, y = count, fill = REASON_FOR_FORCE)) +
  geom_bar(stat = "identity") +
  labs(x = "Officer Race", y = "Count", fill = "Reason for Force") +
  theme_minimal()

Incidents based on Dallas Geography

# Contingency table to see how many incidents happened in which division of Dallas and by which race
table(data$DIVISION, data$SUBJECT_RACE)
##                
##                 American Ind Asian Black Hispanic Other Records Missing White
##   CENTRAL                  0     4   302       93     3              14   147
##   NORTH CENTRAL            0     0   117       67     3               5   127
##   NORTHEAST                0     1   223       60     0               1    56
##   NORTHWEST                0     0    54       76     5               2    54
##   SOUTH CENTRAL            0     0   269       25     0               2    14
##   SOUTHEAST                0     0   233       76     0               7    46
##   SOUTHWEST                1     0   135      127     0               8    26

This table presents a summary of the distribution of individuals across different racial/ethnic groups in seven different regions. The data suggests that the highest count of Black individuals is in the Central region, where there are 302 recorded instances. Similarly, the South Central region has the highest count of Hispanic individuals with 25 records, while the Southwest region has the highest count of White individuals with 26 records. Interestingly, the Northeast region has the highest count of Asian individuals with only one more individual compared to the Central region. Conversely, only one American Indian individual is recorded in the Southwest region. The “Other” racial/ethnic category has a total count of 18 individuals across all regions. Moreover, the data indicates that the highest number of missing records is in the Central region with 14.

#Visualisation 19: Incidents by Division and Month

# Summarize the data
division_counts <- data %>%
  group_by(INC_MONTH, DIVISION) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'INC_MONTH'. You can override using the
## `.groups` argument.
# Plot the data
ggplot(division_counts, aes(x = DIVISION, y = INC_MONTH, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  geom_text(aes(label = count), color = "black", size = 3) +
  labs(x = "Division", y = "Month", title = "Incidents by Division and Month")

# summarize the data by month and division
df_div <- data %>%
  group_by(INC_MONTH, DIVISION) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'INC_MONTH'. You can override using the
## `.groups` argument.
# Incidents by division and month
ggplot(df_div, aes(x = INC_MONTH, y = count, group = DIVISION, color = DIVISION)) +
  geom_line(size = 1.2) +
  facet_wrap(~ DIVISION, ncol = 4) +
  scale_color_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  labs(x = "Month", y = "Incident Counts", 
       title = "Incidents by Division and Month",
       subtitle = "Trends of incident counts by division over time",
       color = "Division") 

#Incidents in different divisions overtime
division_incidents <- as.data.frame(table(DIVISION, INCIDENT_DATE))
division_incidents$INCIDENT_DATE <- as.Date(division_incidents$INCIDENT_DATE, format =  "%m/%d/%Y")
names(division_incidents)[names(division_incidents) == "INCIDENT_DATE"] <- "Date_Happened"
names(division_incidents)[names(division_incidents) == "Freq"] <- "Total_Crimes"

division_incidents <- ggplot(division_incidents, aes(x = Date_Happened, y = Total_Crimes))  + 
  geom_line(color = "black") +
  facet_wrap(~DIVISION,ncol=2) +
  theme(legend.position="none") + 
  labs(x="", y= "", title="Incidents frequency in different divisions over time")

ggplotly(division_incidents, tooltip = c("Total_Crimes"))
#Visualisation 20: 
division_incidents_count <- data.frame(table(LOCATION_LONGITUDE, LOCATION_LATITUDE, DIVISION)) %>% filter(Freq != 0)
names(division_incidents_count)[names(division_incidents_count) == "Freq"] <- "Total_Crimes"
division_incidents_count <- division_incidents_count[-1,]

library(leaflet)
library(leaflet.extras)

leaflet() %>% 
  addProviderTiles("Stamen.TonerLite") %>% 
  addCircleMarkers(data=division_incidents_count, 
                   lat = as.numeric(LOCATION_LATITUDE), 
                   lng = as.numeric(LOCATION_LONGITUDE), 
                   radius = ~Total_Crimes/100,
                   fillColor = 'darkBlue', 
                   fillOpacity = 0.6,
                   stroke = FALSE,
                   label = ~Total_Crimes,
                   labelOptions = labelOptions(
                     noHide = TRUE,
                     direction = 'auto')
  ) %>%
  addHeatmap(
    data = division_incidents_count,
    lng = as.numeric(LOCATION_LONGITUDE),
    lat = as.numeric(LOCATION_LATITUDE),
    radius = 20, 
    blur = 15
  ) %>%
  addLegend(
    position = 'bottomright',
    pal = colorNumeric(palette = 'Blues', domain = division_incidents_count$Total_Crimes),
    values = division_incidents_count$Total_Crimes/100,
    title = 'Total Crimes',
    opacity = 0.6
  )
## Warning in pal(c(r[1], cuts, r[2])): Some values were outside the color scale
## and will be treated as NA
## Warning in validateCoords(lng, lat, funcName): Data contains 55 rows with either
## missing or invalid lat/lon values and will be ignored

## Warning in validateCoords(lng, lat, funcName): Data contains 55 rows with either
## missing or invalid lat/lon values and will be ignored
# Visualisation 21: Officers race by division

# summarize the data by division and officer race
df_race_division <- data %>%
  group_by(DIVISION, OFFICER_RACE) %>%
  summarize(count = n()) 
## `summarise()` has grouped output by 'DIVISION'. You can override using the
## `.groups` argument.
# create the pie chart
ggplot(df_race_division, aes(x = "", y = count, fill = OFFICER_RACE)) +
  geom_bar(stat = "identity", width = 1) +
  facet_wrap(~ DIVISION, ncol = 3) +
  labs(x = "", y = "", 
       title = "Officer Race by Division",
       fill = "Officer Race") +
  coord_polar(theta = "y") +
  theme_void() +
  theme(legend.position = "bottom")

According to the data, the Central Division has reported the highest number of crimes with more than 40 incidents. In contrast, the Northwest Division has the lowest incident rates and the number of incidents has been decreasing over time. The South Central, Southeast, and Southwest Divisions have also shown a sharp decline in incidents over time. Meanwhile, the Northeast and North Central Divisions have shown a neutral trend. Towards the end of the year, almost all other divisions exhibit a decrease in incidents, and December has the lowest number of incidents in all Divisions except for Northeast. The Northeast Division has a constant and steady intermittent trend.

With the officer distribution in the different locations, it can be made certain that the assumption for racial profiling cannot be justified. There is need for more data-points to ensure that it can be one of the problems for Black community being reported for crime more than any other community in Dallas. The most evident thing that can be ensured from the above analysis is that there is an immediate need for localising crime and making policies to curb the crimes committed by the different communities at different locations.

References

  1. https://www.kaggle.com/datasets/center-for-policing-equity/data-science-for-good/discussion/74416
  2. https://www.kaggle.com/code/araraonline/cpe-helper/notebook
  3. https://plotly.com/r/pie-charts/
  4. https://www.rdocumentation.org)
  5. https://rstudio-pubs-static.s3.amazonaws.com/170893_9e4e88b0dedc4adfa3bd5a87eb64a9ba.html
  6. https://stats.stackexchange.com/questions/tagged/data-visualization